ABSTRACT

This document discusses social survey research in
which the need for identification of respondents may bring social
research into conflict with the law and social custom. The paper
deals with two features of the conflict: the products of such
research, and the way in which privacy of the respondent can be
assured regardless of the product. The paper is divided into five
parts: (1) Introduction; (2) Longitudinal Inquiry: Its Definition,
Justification, and Bearing on Record Linkage; (3) Correlational
Research: Definitions, Justification, and Relevance to Record
Linkage; (4) Privacy Implications: Private with Respect to Whom?; and
(5) Competing and Conjoint Approaches to Assuring Confidentiality of
Response in Social Research. (Author/JLL)
1. Introduction



In social survey research, the respondent's identification ordinarily 
serves as an accounting device. Both identification and associated data 
are maintained under the proviso that they will be used only for research 
purposes and, in particular, will not be used to make personal judgements
about identified individuals. Despite the proviso, the need for identifiers
may bring social research into sharp conflict with law and social custom.
This paper deals with two features of the conflict: the products of such
research and the way in which privacy of the respondent can be assured
regardless of the product.

In particular, Parts 2 and 3 concern longitudinal and correlational
research in which identifiers are normally deemed essential. There is a
special emphasis on the practical consequences, including loss and
distortion of information, engendered by thoughtless abridgement of one's
ability to track individuals over time. And because the social benefits
of the research will often clearly offset the privacy depreciation effects,
there is a special emphasis on benefits of the research product. Our
illustrations are taken from medical studies, economics, education,
psychology, and sociology.

Part 4 briefly covers a topic which is already familiar to some of
you: the privacy problems implied by social research efforts in general,
and by the illustrations in particular. In Part 5, some general strategies
for resolving problems are laid out, together with a few examples of their
application. Here too the discussion is brief but broad in its coverage
of procedural, statistical, and law-based solutions to the problems. The
main theme here is minimizing degradation of privacy without preventing
good research designed to better understand human behavior.

2. Longitudinal Inquiry: Its Definition, Justification,
and Bearing on Record Linkage

Longitudinal research refers here to the process of tracking a group
of individuals over time to establish how the state of that group varies
and, more importantly, to establish the average relation between an
individual's state at one point in time and his state at some other time.
For example, one may conduct a study of adults to learn not only how
health status of the group changes with age, but also to understand how
the individual's health at one age is correlated with status at a later
age. Obtaining an accurate characterization of this sort is necessary
for describing and predicting health status. And it is crucial for the
more demanding task of explaining the bio-social mechanisms which underlie
health status development.

Usually, this methodology requires that an observation on a person
at a particular time be linked with observations made on that person at
subsequent times, for each person in a sample. The vehicle for linkage
is typically, though not always, the individual's identification. The
linkage implies some degradation of privacy, and so it behooves us to ask
why such research is justified, to ask what we can learn or have learned
from such research.
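
As a minimal sketch of what linkage by identifier involves, the Python fragment below joins two hypothetical survey waves on a respondent identifier and then computes the correlation between an individual's state at the two times. The identifiers, the health scores, and the function names are illustrative assumptions, not data or procedures from any study cited here.

```python
from math import sqrt

# Hypothetical survey waves: respondent identifier -> health score.
wave_1 = {"R01": 72, "R02": 65, "R03": 80, "R04": 58, "R05": 90}
wave_2 = {"R01": 70, "R02": 60, "R03": 83, "R05": 88, "R06": 75}


def link_waves(first, second):
    """Pair each respondent's first and second observation by identifier.

    Respondents missing from either wave cannot be linked and are dropped.
    """
    shared = sorted(first.keys() & second.keys())
    return [(first[rid], second[rid]) for rid in shared]


def pearson(pairs):
    """Pearson correlation between the two members of each pair."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


linked = link_waves(wave_1, wave_2)
print(len(linked), "respondents linked")
print("correlation across waves:", round(pearson(linked), 3))
```

Without the identifier, only the two separate distributions of scores would be available, and the correlation between an individual's earlier and later state could not be computed at all.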

In the following remarks, some evidence bearing on these questions 
is presented. Section 2.1 covers some of the logical traps in which we 
can easily be ensnared if we choose not to do longitudinal research. 
Sections 2.2 and 2.5 consider a few discrete examples of longitudinal
research and their products. 

2.1 Traps, Artifacts, and Circularity 

One of the simplest ways to illustrate why longitudinal data may be
essential for even primitive understanding is to compare it with a
(ostensibly equivalent but) less demanding mode of data collection.
Cross-sectional studies, for example, have been suggested as a way of learning
as much about human behavior as longitudinal investigations. And because
they involve observation of a large sample at only one point in time, they
are said to degrade privacy to a lesser degree than the longitudinal
approach.

Consider, for example, the problem of understanding how intelligence
(or certain intellectual achievement) varies with age. One might conduct
a survey of a sample of children of age 3, say, and then continue to
survey those individuals annually until they reach an advanced age. Or,
in the interest of saving time and perhaps on privacy grounds, we might
choose to conduct a single survey of a representative sample of (anonymous)
3-year-olds, a sample of 4-year-olds, and so forth at only one point in
time, under the assumption that this cross-sectional survey would yield
roughly the same results as the longitudinal survey. This last assumption,
that a growth curve based on longitudinal data will be roughly equivalent
to a growth curve based on cross-sectional data, is critical.

The assumption also happens to be wrong with alarming frequency. In 
particular, its espousal by some human-development experts has led to some
erroneous, not to say embarrassing, folklore about the development of
human intelligence. The same assumption has been a trap in some economic
welfare research, in some epidemiological work, and in other areas. 

To understand one of the logical traps here, consider Figure 1a, a
chart commonly used during the 1940's and 1950's to illustrate the gradual
increase in IQ from childhood to early adulthood, and the gradual decrease
thereafter. The implication of the graph, which is based on actual
cross-sectional data, is that at age 30 one's IQ is at its peak, and things go
downhill soon after that. What makes the chart much more persuasive is
that similar inverted-U patterns show up in other investigations of human
ability based on cross-sectional data. This includes the quality of
treatises written by eminent philosophers (rated by eminent philosophers)
when plotted against the age at which the author wrote the document. And
it includes the level of innovativeness of theory and invention of chemists
when plotted against the chemist's age at the theory's production, and
similar data (see, for example, Birren, 1964).

Figure 1. Examples illustrating the confounding of age and cohort
differences in cross-sectional research. Source: J. R. Nesselroade and
Paul B. Baltes, Adolescent personality development and historical change:
1970-1972. Monographs of the Society for Research in Child Development,
1974, 39 (1, Serial No. 154), page 4.

Suppose now that instead of the cross-sectional data, there existed
longitudinal data on exactly the same individuals. The dotted lines in
Figure 1b illustrate how actual IQ may increase consistently with age
without a notable decline, and how the rate of increase can depend on
year of birth, i.e., on cohort. The points connected by the solid line
correspond exactly to what appears in the plot of cross-sectional data.
The chart suggests that individuals born in 1910 increase in intelligence
as they grow older. But their rate of increase is lower than the
corresponding rate for a younger cohort, e.g. individuals born in 1930. The
reasons for differences in development rate, or "cohort effects," are a
matter of speculation. They may involve any number of bio-social factors;
the differences may even be an artifact of the increasing reliability or
culture-relatedness of such tests. Regardless of the reasons, the point
is that the longitudinal data offer us a less misleading picture of human
development than the cross-sectional data. Moreover, the theory generated
by the former will differ markedly from theory generated by the latter.
It is clear that relying solely on the cross-sectional data can lead one
to a conclusion which is quite contrary to the way nature behaves. In
point of fact, there is reliable evidence from studies by Barton et al.
(1975), Schaie (1965), and others that Figure 1b is a more realistic
portrayal of nature than Figure 1a.
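
The confounding of age and cohort can be reproduced with a few lines of Python. The sketch below is purely illustrative: the score function, its constants, and the survey year are assumptions chosen to mimic the shape of the argument, not values taken from Figure 1 or from any of the studies cited. Every cohort's score rises steadily with age, yet a single cross-section taken in one year shows a rise followed by a decline.

```python
from math import exp

SURVEY_YEAR = 1960          # assumed year of the one-time cross-sectional survey
AGES = range(10, 80, 10)    # ages 10, 20, ..., 70


def score(age, birth_year):
    """Hypothetical test score: rises with age within every cohort,
    toward an upper limit that is higher for later-born cohorts."""
    upper_limit = 100 + (birth_year - 1900)
    return upper_limit * (1 - exp(-age / 15))


# Longitudinal view: follow one cohort (born 1910) as it ages.
longitudinal_1910 = [score(age, 1910) for age in AGES]

# Cross-sectional view: in the survey year, each age group is a different cohort.
cross_section = [score(age, SURVEY_YEAR - age) for age in AGES]

for age, within, across in zip(AGES, longitudinal_1910, cross_section):
    print(f"age {age:2d}  cohort 1910: {within:6.1f}  cross-section: {across:6.1f}")
```

Under these assumptions the cross-sectional column peaks at an intermediate age and then falls, while the column for the 1910 cohort never declines. The two columns agree only at the one age at which the 1910 cohort happens to be observed in the survey year.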

Exactly the same inferential problems occur for a variety of
physical and social measures of individual characteristics.
Plots of height against age, for example, often
show an inverted-U pattern if based on cross-sectional data,
simply because rates of growth and upper limits on growth
are quite high for children recently born, relative to the
growth rate and upper limits for those born 80 years ago.
Plots of cross-sectional data on level of extroversion and
age of adolescents in certain areas of the United States
make it appear that extroversion declines through adolescence
when it actually increases on the average and increases most
quickly for recently born cohorts. Longitudinal data on
adolescent tough mindedness (autonomy, assertiveness)
suggest a fair degree of stability over ages 12-15; more
recently born cohorts generally exhibit higher levels of the
trait. But cross-sectional data show a declining trend.

Some of you may regard "soft" social data, like psychological measures,
as particularly susceptible to the inferential trap just described. The
fact is that even data on "hard" social variables, such as income, are
no more immune to the problem. Consider, for example, estimates of
lifetime income for individuals. These predictions are important in the
commercial arena, e.g. in some credit and loan research in the insurance
business. And they are no less important in the government sector, e.g.,
in planning social security benefits and the like. Often there is a
choice between using cross-sectional data or using longitudinal research,
and if both provide equally accurate estimates, then one might choose
the cross-sectional approach for managerial reasons or on grounds that a
cross-sectional survey involves less degradation of the privacy of
individuals since one can presumably elicit anonymous responses. Miller
and Hornseth's (1970) attempt to estimate lifetime income for certain
segments of the population is interesting in this respect.

That estimates of lifetime income based on the two kinds of
data will not be the same is clear from Tables 1 and 2.
Table 1, based entirely on cross-sectional surveys, suggests
that annual income increases up until age 35, stabilizes
during the 35-54 year age interval, then declines. The
pattern is similar whether one considers data collected in
1947, or 1948, or 1949. Table 2, on the other hand, is
based on longitudinal data and illustrates a much less
drastic pattern, notably that increases in income persist
over a wider age range, and rates of increase are
substantial. The longitudinal data are, of course, affected by
inflation and other factors uniquely associated with a
given cohort, but similar patterns occur after adjustment
for inflation. They are more accurate than the
cross-sectional data in the crude sense that they better describe
the way observable income behaves as a function of age.
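
The contrast between the two tables can be made concrete with a short calculation. The sketch below transcribes the 1947 cross-section from Table 1 and the first cohort from Table 2 and computes the percent change in mean income between adjacent age groups; the function and variable names are our own illustrative choices. The figures are in current dollars, so the cohort changes include inflation, as noted above.

```python
# Mean annual income in dollars, transcribed from Tables 1 and 2.
AGE_GROUPS = ["25-34", "35-44", "45-54", "55-64"]

# Table 1: one cross-section, all age groups observed in 1947.
cross_section_1947 = {"25-34": 2704, "35-44": 3344, "45-54": 3329, "55-64": 2795}

# Table 2, cohort 1: the same group observed in 1947, 1957, and 1967.
cohort_1 = {"25-34": 2704, "35-44": 5300, "45-54": 8342}


def successive_changes(profile):
    """Percent change in mean income between adjacent age groups."""
    ages = [a for a in AGE_GROUPS if a in profile]
    return {
        f"{lo} to {hi}": round(100 * (profile[hi] - profile[lo]) / profile[lo], 1)
        for lo, hi in zip(ages, ages[1:])
    }


print("cross-section:", successive_changes(cross_section_1947))
print("cohort 1:     ", successive_changes(cohort_1))
```

The cross-section shows a modest rise, a plateau, and then a decline across age groups, whereas the cohort followed over time shows large increases at each step.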

Though the example is recent, the problem of estimating lifetime
earnings from cross-sectional data is not a new one for economists.
Klevmarken (1972) gives a tidy and brief description of the history of
the problem in this context and points out practical needs for better
estimates in labor negotiation, actuarial sciences, and elsewhere. More
important, he has managed to show, using both longitudinal and
cross-sectional data, how one could develop less misleading models of lifetime
income curves if one had available only the cross-sectional data. He
makes the same point as we do, however, in observing that there is no
generally reliable way to establish longitudinal trends from
cross-sectional data alone. Any attempt to do so must be based on assumptions
which, for the social scientist, may easily fail to be met in reality.

A different but no less important trap is the failure to recognize
that longitudinal rather than cross-sectional data may be essential for
detecting subtle influences on human behavior. The problem of designing
precise investigations is particularly important in estimating the impact
of social programs whose effects, we know, are often weak but may
nonetheless be politically important. Achieving that objective often depends
on the availability of longitudinal data. There is a large array of
analytic techniques, for example, which employ the correlation between
behaviors at different points in time to expunge irrelevant variation
from the data. The use of longitudinal research techniques, especially
in conjunction with randomized experiments, usually makes it easier to
detect influences which might otherwise be obscured by the normally high
variation in human behavior.

Table 1

Estimates of Mean Annual Income in Dollars for Ages 25 through
64, Based on a Cross-Section of Men Sampled in 1947, a Cross-
Section Sampled in 1948, and a Cross-Section Sampled in 1949

Year/Age     25-34     35-44     45-54     55-64

1947          2704      3344      3329      2795
1948          2898      3508      3378      2946
1949          2842      3281      3331      2777

Adapted from data presented by Miller and Hornseth (1970).


Table 2

Estimates of Mean Income in Dollars over 10-Year Intervals
for Six Cohorts of Individuals

Year/Age     25-34           35-44           45-54

1. 1947      2704 (1947)     5300 (1957)     8342 (1967)
2. 1948      2898 (1948)     5433 (1958)     8967 (1968)
3. 1949      2842 (1949)     5926 (1959)     9873 (1969)

Year/Age     35-44           45-54           55-64

4. 1947      3344 (1947)     5227 (1957)     7004 (1967)
5. 1948      3508 (1948)     5345 (1958)     7828 (1968)
6. 1949      3281 (1949)     5587 (1959)     8405 (1969)

Note. Each cohort has been surveyed every 10 years.
The first cohort, for example, contains individuals who
were 25-34 years of age in 1947 and had an average income of
$2,704; in 1967, when they were 45-54 years of age, their
mean income was $8,342. Adapted from Miller and Hornseth
(1970).


Consider, for example, the Cali, Colombia experiments on
the impact of nutritional supplements on children's physical
growth. Special nutritional supplements were assigned
randomly to a sample of malnourished children; supplements,
which were in short supply, were unavailable to an otherwise
equivalent sample of comparison group children. The impact
of the supplements was not evident from scrutiny of mean
changes in treated and untreated groups; the simple natural
variation in heights of even malnourished children is
sufficiently large to obscure real differences. More
sophisticated analyses, using correlations between repeated
measures of height of the children, did yield estimates of
program effect which differed notably from chance level.
As a consequence of the positive finding, the supplements
are being improved, put into local production, and tested
on a much larger scale in three other less well developed
countries. (Beiar, 1975; Sinesterra, McKay, & May, 1971)
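
The gain in sensitivity that comes from exploiting the correlation between repeated measures can be sketched numerically. The Python fragment below is not the analysis used in the Cali experiments; it is a textbook large-sample approximation for a randomized two-group comparison, assuming equal group sizes and equal variances at both measurement times. The standard deviation, group size, and correlation are illustrative assumptions, and the function names are hypothetical.

```python
from math import sqrt


def se_post_only(sd, n_per_group):
    """Standard error of the treated-minus-comparison difference in
    mean height when only the follow-up measurement is analyzed."""
    return sqrt(2 * sd ** 2 / n_per_group)


def se_adjusted(sd, n_per_group, rho):
    """Approximate standard error of the same difference after adjusting
    each child's follow-up height for the earlier height; rho is the
    correlation between a child's two measurements."""
    residual_var = sd ** 2 * (1 - rho ** 2)   # variation left after adjustment
    return sqrt(2 * residual_var / n_per_group)


# Illustrative assumptions only: SD of height 5 cm, 50 children per group,
# and a correlation of 0.9 between repeated height measurements.
sd, n, rho = 5.0, 50, 0.9
print("follow-up only SE:", round(se_post_only(sd, n), 2))
print("adjusted SE:      ", round(se_adjusted(sd, n, rho), 2))
```

Under these assumptions the adjusted comparison has well under half the standard error of the follow-up-only comparison, so a program effect too small to stand out against natural variation in height may still be distinguishable from chance. With no correlation between measurements the two standard errors coincide, which is why the linked repeated observations are essential to the gain.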

The same use of a longitudinal approach for the sake of sensitive
analysis of program effects is evident in other areas. Heber et al. (1972),
for example, have conducted 6-year studies to determine the relative impact
of special programs for reducing the risk of functional retardation among
infants and young children; based on these Wisconsin pilot tests, similar
test programs are being mounted in North Carolina and elsewhere. Beyond
the midpoint in Kaiser Permanente's 10-year experiments, Ramcharan et al.
(1973) find evidence for the impact of multiphasic screening on prevention
of disease, an impact which is bound to be negligible during the first few
years of the program. In the economic area, the Housing Allowance
Experiments require 8-12-year followups to determine incremental benefits of
income subsidy plans on the poor, and to provide information for effective
legislation in the area. In these cases and in innumerable others (see
Riecken et al., 1974), the effects may be undetectable in the short run,
and difficult to detect in the long run, especially if the groups involved
are quite small. There is simply no reliable substitute for longitudinal
followups in these instances.

The final logical trap of interest here bears on both longitudinal
and correlational research; it involves the analysis of data based on
aggregates of individuals in order to make judgements about individuals
within the groups. To establish the average relation between literacy
and race in the United States, for example, one might obtain published
census statistics on the proportion of literate persons and the percentage
of Negroes for each of 48 states and then compute the correlation
between the two variables. Aggregated data might be used here on grounds
that the relevant information is easily accessible from published tables.
Or, we might justify our action on grounds that the use of published data
does not present the privacy-related problems which might be engendered
by a special survey.

There are two weaknesses implicit in the argument that aggregate
data can be used in lieu of individual data. The more obvious one is that
inferences made about groups are not necessarily appropriate to the
individual and in fact may be quite inaccurate. The second weakness, more
a matter of precision than accuracy, is that analyses based on grouped
data are often considerably less likely to reflect changes in individuals
than analyses based on data at the individual level.

To be more specific, consider the literacy-race example. At a
particular point in time the correlation between literacy rate (percent
literate) and color (percent Negro) computed on the basis of the nine
census regions of the United States is .95. When individuals are grouped
by state rather than region, the correlation is .77. Finally, when
individuals are not grouped at all, but the entire disaggregated
population is considered, the correlation is .20. (The example is from Robinson's
fine paper (1950) on census data prior to 1950.) A similar problem with
a different resolution appears if we try to determine the relation between
color (white-nonwhite) and occupation (domestic service versus other) for female
employees in Chicago in 1940. Though a correlation based on percentage
data for each of nine areas is .34, the actual correlation based on
individuals is .29, not too different from the area-based estimate (see
Duncan & Davis, 1953; Goodman, 1953).
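
How grouping can inflate a correlation is easy to demonstrate with made-up numbers. The sketch below builds four hypothetical regions (the counts are invented for illustration and are not Robinson's census figures) in which illiteracy depends mostly on region and only slightly on group membership. It then computes the correlation twice: once across the four regional percentages and once across all individuals.

```python
from math import sqrt

# Hypothetical regions: (members of the group, illiterate among members,
#                        non-members, illiterate among non-members)
REGIONS = [
    (100, 6, 900, 34),
    (200, 16, 800, 44),
    (300, 30, 700, 56),
    (400, 48, 600, 54),
]


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Aggregate level: one pair of percentages per region.
pct_member = [100 * n_in / (n_in + n_out) for n_in, _, n_out, _ in REGIONS]
pct_illit = [100 * (ill_in + ill_out) / (n_in + n_out)
             for n_in, ill_in, n_out, ill_out in REGIONS]
r_aggregate = pearson(pct_member, pct_illit)

# Individual level: one (membership, illiteracy) pair of 0/1 codes per person.
member, illiterate = [], []
for n_in, ill_in, n_out, ill_out in REGIONS:
    member += [1] * n_in + [0] * n_out
    illiterate += ([1] * ill_in + [0] * (n_in - ill_in)
                   + [1] * ill_out + [0] * (n_out - ill_out))
r_individual = pearson(member, illiterate)

print("correlation across regions:    ", round(r_aggregate, 2))
print("correlation across individuals:", round(r_individual, 2))
```

With these invented counts the regional correlation is close to 1 while the individual-level correlation is below .10, the same direction of discrepancy as in the literacy-race example. The second figure can be computed only from data on individuals.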

In the literacy-race example, the high correlation obtained from
the regional data might be interpreted as suggesting that illiteracy is
pervasive among blacks, and furthermore, that a massive program of
education must be put into effect to counteract the problem. In point of fact,
if we look at individuals' data, rather than at data based on opportunistic
groups into which individuals may fall, we reach a considerably less
pessimistic and a more accurate conclusion: that the relation between
race and literacy was small but notable. Any attempt to resolve the
problem of illiteracy by making a massive investment in rehabilitating
the reading skills of each individual, based on the .95 regional
correlation, is bound to be a wasteful allocation of scarce resources.

An obvious problem in these matters is the use of aggregates of 
individuals as a surrogate for individual persons. Since the aggregates 
are usually constructed for political or administrative purposes (e.g. 
census regions, health care service regions), it is unlikely that these
"natural" aggregates will constitute valid replicas of real persons.
We have only a little theory to guide us in selection of "proper" aggre- 
gates. And it is impossible to predict whether an aggregate will be 
proper without some data at the individual level. 

The problem is a chronic one in the social and administrative
sciences which must rely heavily on aggregated data: sociology,
epidemiology, economics, statistical geography. It is particularly
crucial in attempts to evaluate the impact of national social programs
on individuals. Many such evaluations, in education for example, rely
on data aggregated at the school district level to estimate the impact
of a nationally supported compensatory reading program for disadvantaged
youth. The inferences made about individuals (based on analysis of
aggregates rather than on individuals) are generally biased in an unknown
fashion (the individual data not having been analyzed), and are imprecise
because the aggregate data are insensitive to changes, even some marked
changes, in individuals (see Burstein, 1976, for examples). This is not
to say that the aggregation problem will always yield biased estimates.
It is to say that the problems are crucial and cannot be resolved
unequivocally without some evidence based on individual rather than aggregated
data.

2.2 Medical Research

There is a long history of longitudinal studies in medical research,
dating at least from Hippocrates' efforts to characterize the progressive
stages of disease among his own and his colleagues' patients (King, 1971).
The systematic tracking of both the healthy and the ill remains a basic
weapon in the medical research armamentarium. At its best, the approach not
only helps to identify the existence and incidence of a disease entity, to
determine symptom development and disease consequence, but it is essential
in laying out the array of possible progenitors of the disease. Longitudinal
methods in this sector have become considerably more efficient over the
last 40 years with the development of survey sampling technology. And
when coupled to other methods, such as randomized experimental tests, the
approach can be dramatic in identifying whether and how well particular
treatment programs work.

Examples of the process are not hard to find; for the sake of
detail, suppose we examine a

gifted science writers (such as Gilmore, 1973) and researchers (such as
Kannel et al., 1961), is among the best documented. Modern work on
coronary heart disease appears to have reached a turning point during the
1940s and 1950s with autopsy studies. Those investigations, because of
their small size and cross-sectional nature, provided thin support for the
linkage among natural development of arteriosclerosis, heart disease, and
bio-physical conditions (blood pressure, etc.), and more importantly, provided
the evidence necessary to justify longer term longitudinal study of the
problem. The Framingham Study (Kannel et al., 1961), among the largest
of subsequent efforts, was designed to better establish relations between
prior condition and subsequent death due to heart attack. Spanning 25
years in the lives of 9000 men, the effort was of sufficient size and
duration to permit computation of risk factors operating in the
population: Actuarial tables were developed to illustrate the likelihood of
heart attack as a function of earlier serum cholesterol level, blood
pressure, EKG abnormalities, and so forth. Other studies (animal
experiments and comparative investigations of populations with natural differences
in these factors) yielded evidence which added to speculation about the
role of serum cholesterol level and other factors in heart disease.
Because the ability to describe and predict based on longitudinal data does
not necessarily yield unequivocal information on causes of the disease,
and because study of human populations yields results which are similarly
ambiguous, long-term experimental tests of alternative treatment programs
have been mounted. The best of those tests generally involve large
samples tracked over long time periods and, moreover, randomized
assignment of individuals to one of the competing treatments. As a consequence,
they raise problems more serious than those engendered by longitudinal
research alone. Nonetheless, pilot efforts, such as the Diet Heart
Feasibility Study, have been completed to furnish data on the practical
difficulty of field tests and somewhat less equivocal small-scale data
on the impact of diet control on heart disease. Such short-term
(two-year) studies have paved the way for longer term studies which focus on
the more plausible and tractable causal mechanisms, notably reduction of
heart disease through diet or drugs which reduce serum cholesterol levels.
The largest of current clinical trials will run five years and involves
over 50 institutions and 8000 patients; it is designed to evaluate the
effectiveness of alternative diets and drug dosage levels for reducing
cholesterol level in the bloodstream (Coronary Drug Project Research Group,
1973). Although the primary response variables are mortality rate due
to heart disease and related illness, a variety of social, biological,
and physiological measures are being obtained. The social measures
(smoking habits, lifestyle measures, race, job characteristics, and so on)
are expected to add precision to results and to help identify variables
which though influential are less amenable to direct control.

The products of earlier longitudinal studies coupled to experimental
tests are readily accessible (see Boruch & Riecken, 1975; Riecken et al.,
1974, and references cited therein). Long-term followup of released
prisoners who have had cosmetic surgery to remedy facial disfigurement
has given us evidence for lower recidivism among prisoners so treated.
Longitudinal experiments on the effectiveness of physician surrogates
(nurse practitioners, physician extenders) have yielded information
for reducing costs of medical service, for planning innovative programs
in health care utilization, and the like. A new generation of pharmacy
research focuses on both short-term and long-term drug-taking behavior:
determining level of patient compliance with medication regimens,
determining how special packaging of medication influences compliance especially
among the elderly, and so on. In preventive medicine, tests of the impact
of multiphasic screening by Kaiser-Permanente are being run for a 10-year
period to assure that long-term effects of annual screening on detection
and amelioration of disease are well documented.

In brief, these examples and others like them teach us that there
is no way to establish etiology of disease or to evaluate the
effectiveness of prevention and treatment programs without longitudinal study.
Not that longitudinal research is sufficient. Its natural limitations
must usually be broadened by coupling this approach to others, notably
experiments, designed to establish cause-effect relations. But the idea
is central to medical research and, in principle and in practice,
generalizable to other areas.

2.3 Psychology and Psychiatry: Biochemical Bases of Schizophrenia

For the past 100 years, the scientific and lay arguments over the
causes of schizophrenia have been supported largely by ambiguous data.
The information at its worst has been unreliable and no more than anecdotal
in form; at its best it has been based on longitudinal study of very small
numbers of individuals and heavily reliant on retrospective reports of
unknowable reliability. The debate's focus has changed markedly during
the years, however, in part because of longitudinal research which depends
heavily on record linkage (Mednick & McNeil, 1968; Mednick, Schulsinger,
& Garfinkel, 1975).



One of the basic problems in discovering the origins of schizophrenia,
as many of you know, is to disentangle the biochemical causes of the
problem from the environmental influences. To resolve the problem, researchers
in Denmark, at the New School for Social Research, at the Psykologisk Institute
(Copenhagen), and at the Kommune Hospitalet (also Copenhagen) have
conducted longitudinal studies of over 4000 adopted children to discover
how incidence of schizophrenia among them varies with occurrence of
schizophrenia in their natural families and in their adopted families.
If, for example, the schizophrenia among children born of schizophrenic
parents but raised by adopted nonschizophrenic parents is high, then one
has more reason to believe that the malady's origin has a genetic component.
Schulsinger's findings, obtained in collaboration with David Rosenthal,
Seymour Kety, and Paul Wender of the United States, are that:

• The incidence of schizophrenia is substantially higher
among adopted children who had schizophrenic parents
than among adopted children whose foster parents were
schizophrenic.

• There is a very low incidence of schizophrenia among
children born of parents who are not schizophrenic.

• If children whose natural parents are not schizophrenic
are later adopted by a schizophrenic foster parent, there
is no increase in likelihood that the child will become
schizophrenic.

This information is an elementary but important step in establishing the
credibility of the idea that the origins of schizophrenia are partly
environmental and partly genetic, and it is important in directing
attention to more fertile areas of research. The latter include careful
studies of the possible genetic mechanisms and of the role played by
certain enzymes (for example) which may produce a predisposition toward
schizophrenic behavior.

These findings could not have been made without longitudinal data
on adopted children and their natural and adopted parents, and without
the crucial linkage among existing medical records, social service records,
and follow-up data collected more recently on the basis of the national
address registry.


2.4 Longitudinal Study in Manpower Economics

In human resources research, good evidence for the usefulness of
longitudinal data has been scanty in part because the relevant data have
been in short supply. The recent buildup of longitudinal files has
helped greatly to understand the data's benefits and limitations, however.
Of particular interest are the National Longitudinal Surveys (NLS) of
the U.S. labor market, begun in 1966 by Herbert S. Parnes (1973). Those
data are based on repeated surveys of a national probability sample of
20,000 individuals in four labor market strata: middle-aged men (45-59
years old at the survey's beginning), women (30-44 in 1966), young men
and young women (14-24 in 1966). The resultant data are being updated
periodically and, stripped of identifiers, are being made available to
the community of manpower researchers. Aside from their obvious benefits
for temporal description of the labor market, the data can be very
informative on account of their longitudinal feature. Parnes (1975) maintains:

Perhaps the single most important contribution of longitudinal
data is that they facilitate the identification of causal
relationships that cannot confidently be identified in any
other way. Take, for example, the relationship between
attitudes and behavior. In cross-sectional data, such
relationships are ambiguous, since one cannot be certain
whether the attitude produces or reflects the behavior. Does
job dissatisfaction lead to turnover, or does an association
between the variables simply mean that individuals who quit
jobs are likely to rationalize their behavior by reporting
(retrospectively) that they were unhappy? When attitudes
measured at one point in time can be related to subsequent
behavior, such ambiguity disappears. The NLS data for
middle-aged men have clearly demonstrated that the degree of
job satisfaction predicts the likelihood of a voluntary job
separation and that a commitment to work in general, as well
as satisfaction with one's particular job, decreases the
likelihood of early retirement.

The usefulness of longitudinal data in clarifying causal
relationships is, of course, not confined to instances in
which one of the variables is attitudinal. For example,
finding that the receipt of training by middle-aged men
between 1966 and 1971 was associated with a net earnings
advantage in 1971 (controlling for such other factors as
education, health, and region of residence), Arvil Adams
went on to demonstrate that the trainees-to-be had already
enjoyed higher earnings in 1966 (again controlling for the
same variables). Thus training was found to be a selective
process, presumably attracting the more highly motivated
or otherwise more productive individuals. To put the
matter differently, some part of what would doubtless have
been identified by a cross-sectional analysis as training's
contribution to earnings was found to have reflected an
incompletely specified model, i.e., the failure to control
adequately for factors associated both with earning and the
probability of receiving training. (Parnes, 1975, pp. 246 ff.)



Professor Parnes is optimistic about the fruits of research based on his
data files. We do not share that optimism, since longitudinal data alone
are often insufficient to make unequivocal judgments about the impact of
manpower training programs. We do agree that such data are essential for
better understanding and prediction of gross labor market behavior and for
establishing tentative hypotheses which can later be verified using more
controlled studies.

2.5 Longitudinal Study in Child Measurement

The National Child Development Study (NCDS) began in 1958 with a
survey of some 5,000 pregnant women. Its main objective (like the United
Kingdom's earlier study, the 1946 Population Investigation Committee
survey) is to establish the linkages among prenatal conditions,
environmental factors, and growth of young children. According to Wall,
the results of 1966 followup data on over 9300 children showed that,
despite strong suspicion to the contrary, the following variables are not
singly predictive of lowered reading ability: maternal hypertension, breech
presentation or forceps delivery, and Caesarean section. Further, there is
an unexpected and strong relation between departure from normal gestation
and reading and social adjustment test scores at age 7; gestational
maturity is a better predictor than the more commonly accepted birth
weight measurement.

The Population Investigation Committee Study yielded other conclusions
which could not have been reached without longitudinal data. Taken
verbatim from Wall and Williams (1970, pp. 42-43), we have:

That the effects of social mobility and increasing material
prosperity have differential effects according to the
educational levels of the parents and the number of children
in the family.

That a relatively poor social environment is cumulative in
its effect on children's height, girls being more sensitive
to this than boys.

That separation from mother, as well as being much more
prevalent than had been thought, seems (in the period from
birth to five years) to provoke less serious permanent
disturbance than might have been expected from clinical
studies of a post hoc kind.

That by the age of 5, broken homes apparently do not
provoke more than temporary disturbance (bedwetting),
and this only in non-manual families.

That the proportions of mothers taking up full or part-time
work increased as their children approached the age of 5,
but that there was no evidence that their children
were less emotionally stable at this age.

That early toilet training leads to earlier bowel control,
less bedwetting, and less breakdown later.

That a high proportion of bedwetters bite their nails
and have speech defects, difficulties which persist even
after they become dry.

That children prematurely born are more vulnerable
physically during their first two years but not afterwards;
that although rather smaller than normal children
in later childhood, this is not the result of prematurity,
and that by the age of 8 they tend to be handicapped
in mental ability, particularly in reading. (Wall &
Williams, 1970, pp. 42-43)

2.6 Education and Its Impact

For the sake of better allocation of scarce resources to education,
it is reasonable to learn how education affects academic achievement and,
subsequently, earnings. We need to know what the most effective elements
of the education process are, how they work, and how they affect the
individual's intellectual and economic development. The impact question
is especially relevant to novel programs designed to overcome disadvantages
under which some social groups labor, i.e., designed to introduce more
equity into the social system through education.

Most research designed to get at these issues begins with cross-sectional
surveys, and even a brief incursion into history shows that
these have been useful despite the limitations of the cross-sectional
approach. Abraham Flexner's 1910 report of his studies of medical schools
in the United States relied solely on this methodology and on Flexner's
standards of performance to produce a major reformation in medical
training. The Thorndike and Ayres studies of school record systems were
similarly useful in moving schools toward higher quality (though still
imperfect) record-keeping practices (Goslin & Bordier, 1969). The
cross-sectional studies have been, and still are, enormously useful in this
context, especially in clarifying the scope of educational problems
where standards of quality are fairly clear.

But the difficulties of making inferences about the impact of
education, or of understanding individual growth, based on cross-sectional
data are no less severe in this sector than they are in the medical
arena. It is difficult, often impossible, to discriminate accurately
between the influence of background variables and those of the school.
It is not generally possible to lay out growth and assay impact without
at least some longitudinal data. The scientific and political traps
here are exemplified by the current U.S. controversy over busing students
from the school district in which they live to another in the interest
of fostering equitable, quality education. James Coleman's advocacy
position five years ago, based largely on cross-sectional data, is
considerably different from his opposition now, based on longitudinal data
and demonstration projects.

Better interpretation, inference, and prediction are conditional on
better theories (models) for simulating social behavior and on data
necessary to support that theory. As a consequence of the shortcomings
of earlier data, a number of longitudinal studies have been mounted to
better understand education impact. We cannot summarize those here; there
are far too many to do so reasonably. So we content ourselves with examining
a nice study by Fagerlind.

This analysis of longitudinal data was designed, in part, to clarify
polar views of the results of public investment in education: the view
that duration of educational experience has a major impact on earnings,
exemplified by Nobel Laureate Paul Samuelson, and Jencks's (and others')
view that the impact on earnings of education beyond the secondary
level is marginal. The Fagerlind research manages to avoid the traps of
cross-sectional data and of short-term longitudinal study by considering
individual growth over a 30-year period. It is an efficient study of a
well defined social group insofar as it builds on survey data initially
collected in 1930 on a subpopulation of children in Malmö. It obtains
both economy and completeness of sampling through the use of population
registries for follow-up surveys. Accuracy and temporal relevance of data
are enhanced by relying on archival records: military selection test scores
for men, data from tax registries on earnings of the respondent and the
respondent's parents, and data from census records on demography, geographic
and occupational mobility, etc. Fagerlind supplemented archival records
with survey data collected during the 1940s, '50s, '60s, and early '70s.

The product of this particular research is interesting not only in
adjudicating polar views: Fagerlind's data, of higher quality than Jencks',
support Samuelson's theory. It also helps to specify the process, the
mechanism underlying education's impact on earnings, through an otherwise
tangled mass of competing influences such as home, family, and so on.
And it has helped to understand shortcomings of competing data and models:
quality of education, for example, has been ignored in many such analyses,
and this one uncovers strong, plausible linkages between quality and
earnings from age 30 onwards.

Still, the longitudinal approach used here is only an interim step.
It is naturally limited in the extent to which it can be applied in
specific, particularly novel, settings. More recent research, for example,
stresses small longitudinal experiments, mounted alone or in conjunction
with larger observational studies, to obtain finer appraisals of innovative
educational programs and practices. Some, like the Heber et al. (1972)
work, are dedicated toward inhibiting intellectual deprivation from infancy.
Others, like MiddleStart, involve randomized tests of programs designed
to improve academic performance of adolescents who are unusually deprived by
virtue of their very poor economic condition. Still other experiments,
designed to improve medical school education, police training, manpower
training, and the like, follow participants through adulthood in the
interest of obtaining less ambiguous information about the short-term
impact of expensive and specialized education programs (see Boruch &
Riecken, 1975; and Riecken et al., 1974).

3. Correlational Research: Definitions, Justification,
and Relevance to Record Linkage

Correlational research refers here to the process of establishing
how characteristics of an individual are related to one another. The
average relation, for a large sample of individuals, may be represented
in statistical form by a simple correlation coefficient, by the
probabilities in an actuarial table, and so on. For example, to identify the
relation between level of health status and level of physical activity
during work, one might obtain measures of both variables from each member
of a suitable sample of individuals, link the two elements of information
on each individual, then compute an index of the relation based on that
linkage. The correlation may be of descriptive interest alone, in that it
reflects the existence and strength of a relation between two variables.
It may be more important to an individual, in that the correlation helps
to predict future health status from current physical exertion levels.
Finally, such data make it possible to form tentative ideas about the
biochemical mechanism by which exertion influences health,
i.e., to build theory necessary for the development of better
control of health status.
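
The health and activity example can be made concrete with a short sketch. It is an illustration only: the identifiers, the scores, and the function names `link_records` and `pearson_r` are hypothetical and do not come from any study cited here. The sketch links two separately held files on a shared identifier and then computes a simple correlation coefficient from the linked pairs.

```python
from math import sqrt


def link_records(file_a, file_b):
    """Join two {identifier: value} files on the identifiers they share."""
    shared = sorted(file_a.keys() & file_b.keys())
    return [(file_a[i], file_b[i]) for i in shared]


def pearson_r(pairs):
    """Pearson product-moment correlation for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


# Hypothetical files, each keyed by a respondent identifier.
health_status = {"A01": 3.0, "A02": 5.0, "A03": 4.0, "A04": 8.0, "A05": 6.0}
physical_activity = {"A01": 1.0, "A02": 3.0, "A03": 2.0, "A04": 5.0, "A06": 9.0}

# Only respondents present in both files can contribute to the index.
linked = link_records(health_status, physical_activity)
r = pearson_r(linked)
print(f"{len(linked)} linked respondents, r = {r:.3f}")
```

Note that the identifier is needed only to form the pairs; the correlation itself is computed from values that carry no identification.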

In principle, correlational investigation is a general activity of
which longitudinal research is an important subclass. Both types of
research require some form of record linkage to sustain statistical
analysis. They are discussed separately here on account of traditional
differences in the emphasis of each type of research.

Correlational research often requires that the contents of records
which are maintained by independent archives be linked. The special
functions of linkage vary considerably, but most can be grouped into one
of the following categories:

To assess and improve the quality of available data
from any source;

To reduce costs, duplication of effort, and respondent
burden in surveys;

To clarify and enrich the data base for applied social
research and policy analysis.

The illustrations of the benefits and limitations of linkage are presented
below using this taxonomy to organize our experience.

3.1 Assessing the Quality of Data and Improving the Quality of Data
Analysis

"Response validity" refers to the association between an individual's
response to inquiry under one set of conditions and his response to
inquiry under a second set of conditions which are thought to facilitate
near-perfect reporting. Most such studies involve one form or another of
record linkage. Census interview reports of income of identifiable
respondents may be linked to Internal Revenue Service reports, for example,
to assay the adequacy of the census interview process. Data from interviews
conducted on one occasion under normal conditions may be linked similarly
to later reinterviews to gauge the adequacy of the normal interview. A
mechanism for linkage is critical for computing quantitative indices of
average agreement between the two types of report.

Such information is important for judging the data's credibility, and it
is essential to statistical analysis unless the analysis is to rest on a
secular act of faith. The absence of validity statistics is especially
crucial not only in interpreting statistics but also in using them to
monitor and evaluate programs. Imperfect reporting will usually make it
more difficult to detect changes in human status, and in situations where
data imperfections go unrecognized, data analysis may result in wildly
inaccurate conclusions.
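
One of the simplest indices of agreement is the proportion of linked respondents whose survey report matches the reference record. The sketch below assumes categorical reports (occupation, say) and uses invented identifiers and values; `agreement_rate` is a hypothetical helper, not a procedure taken from the studies discussed in this section.

```python
def agreement_rate(survey, records):
    """Share of linked respondents whose survey report equals the record."""
    shared = survey.keys() & records.keys()
    if not shared:
        raise ValueError("no respondents could be linked")
    matches = sum(1 for i in shared if survey[i] == records[i])
    return matches / len(shared)


# Hypothetical self-reports and archival entries, keyed by identifier.
reported_occupation = {
    "R1": "clerk", "R2": "farmer", "R3": "teacher", "R4": "machinist",
}
recorded_occupation = {
    "R1": "clerk", "R2": "laborer", "R3": "teacher", "R4": "machinist",
    "R5": "nurse",  # in the archive but not interviewed, so never linked
}

rate = agreement_rate(reported_occupation, recorded_occupation)
print(f"agreement: {rate:.0%}, discrepancy: {1 - rate:.0%}")
```

For continuous reports such as income, a mean difference or a correlation between the two sources would play the same role.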

Examples from descriptive survey research. Many of the better
validity studies in the U.S. have been conducted by government agencies
and by university-based research groups. The studies are frequently designed
to support an administrative decision about
whether or not to continue a particular type of inquiry.

In health survey research, for example, a good deal of the
use of record linkage is reported in the proceedings of a
recent national conference (Reeder et al., 1975). The
deficiencies in physicians' records, for example, have
been examined by matching record content with data from
interviews with patients. Distortions in reports made
by physicians to their own medical societies have been
investigated by linking those reports with intensive
interviews subsequently conducted with physicians themselves.
Methods of interview designed to minimize
embarrassment in health-related surveys have been tested
and evaluated using individuals' hospital records as the
standard for accuracy. Surveys of health services
utilization, necessary for planning such services at the
national level, have been validated using side studies
which link individual responses to records maintained by
providers and third-party payers.

Analogous examples appear in manpower research. For
example, to appraise the validity of self-reported
"occupation five years ago," a question which has
appeared in many cross-sectional manpower surveys, the
U.S. Census Bureau conducted tests on 2800 households
in 1968, for whom 1963 data on actual occupation were
available. Despite the use of a variety of methods to
elicit the retrospective report, the differences between
reported and actual status were in the range
23-28% (Jabine & Rothwell, 1970). The linkage here,
between archival record and the survey, was essential
in establishing the validity rate. And the statistics
themselves influenced the Census Bureau's decision to
drastically reduce the use of the retrospective question
in its own surveys and to routinize the correction of
other survey researchers' occupational mobility statistics.

Housing statistics are no more immune from biasing influences.
Intensive reinterviews are often necessary to
establish validity of initial interviews. For example, it is
reasonable to expect that interviewers will vary
notably in their ability to rate quality of housing. In
testing alternative methods for assuring accuracy of the
ratings, it was found that none yielded an acceptable
validity level. And as a consequence, the rating was dropped.
Again, neither the collection of validity statistics nor the
subsequent administrative actions would have been possible
without some mechanism for linking initial enumerator reports
with more expert reinterviews.

Coverage of the census has been examined in the same way.
Hospital records and other records were matched against
the 1960 census, and estimates of coverage were obtained.
Lists of college students were matched similarly against
the 1960 census. On the other hand, matching of lists of
relatively inaccessible individuals, such as welfare
recipients, and of postal service lists provides no special
encouragement for using special lists as a coverage
improvement program. Horvitz (1966) conducted similar studies
in rural areas; the results suggested that 20 to 25%
under-reports of death rates and 15 to 20% under-reports of
birth rates are not unusual when hospital and state medical
records are used as a standard.

These examples illustrate how validity statistics, generated through
record linkage, delimit the credibility of social survey
statistics and can serve as a basis for making decisions about the conduct
of a survey effort.

The practice of conducting side studies such as these, based on
limited record linkage, is practically nonexistent in commercial survey
efforts. It is, however, typical in some governmental surveys and in
research conducted by some university-based research groups. That the
practice is increasing even in these sectors is evident from the
bibliographies published on the topic (notably Scheuren & Colvey, 1975), from
new reporting systems such as Studies from Interagency Data Linkage
for describing the products of the work, and from other evidence.

Examples from program evaluations. Imperfections in either social
survey data or administrative records make it difficult to detect program
effects and, in the worst cases, can produce statistical artifacts which
make programs appear harmful. Estimates of validity, whether based on
record linkage or not, are often essential for refining the design of an
evaluation to accommodate the problem.

More specifically, one of the chronic problems encountered in the
United States has been the production of biased estimates of program
effects under some special but common conditions. Conventional
statistical techniques, such as regression analysis, covariance analysis, and
matching, when applied to fallible data obtained in some observational
evaluations, yield consistently biased estimates of program effects, in
part because imperfect measurement goes unrecognized. Consider, for
example, the Westinghouse-Ohio evaluations of Headstart, a preschool
program for the economically deprived. The initial evaluation relied on
a textbook application of covariance analysis of survey data to explain
how children's verbal ability varies as a function of demographic
characteristics of the children and of their families, and other variables.
The estimates of the impact of Headstart were actually negative, implying
that the program had a harmful effect. It is clear from secondary analysis
of the same data that if one adjusts the conventional analysis so as to
recognize imperfect measurement, the program's effect is negligible and
perhaps even slightly positive (Magidson, Campbell, & Barnow, 1976).
Similar biases have been discovered in the evaluation of manpower training
programs (Director, 1974), in the estimation of the impact of special
medical treatment regimens (James, 1973), and elsewhere (Campbell & Boruch,
1975).
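
The mechanism behind such an artifact can be shown with a small simulation. The sketch below is not the Westinghouse-Ohio analysis nor the reanalysis cited above; every number in it (group means, error variance, sample sizes, a true program effect of zero) is an assumption chosen for illustration, and the correction shown is the elementary one of dividing the covariate slope by the covariate's reliability. Program children start lower on true ability, and the pretest measures ability with error.

```python
import random


def ancova_effect(rows):
    """Covariance-adjusted group difference from (treated, x, y) rows.

    Returns (adjusted_effect, pooled_within_group_slope, diff_y, diff_x).
    """
    groups = {0: [], 1: []}
    for treated, x, y in rows:
        groups[treated].append((x, y))
    means, sxx, sxy = {}, 0.0, 0.0
    for g, pts in groups.items():
        mx = sum(x for x, _ in pts) / len(pts)
        my = sum(y for _, y in pts) / len(pts)
        means[g] = (mx, my)
        sxx += sum((x - mx) ** 2 for x, _ in pts)
        sxy += sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    diff_x = means[1][0] - means[0][0]
    diff_y = means[1][1] - means[0][1]
    return diff_y - slope * diff_x, slope, diff_y, diff_x


rng = random.Random(1976)      # fixed seed so the run is repeatable
TRUE_EFFECT = 0.0              # assumption: the program does nothing
ERROR_SD = 1.0                 # assumption: pretest error as large as ability
rows = []
for treated in (0, 1):
    for _ in range(5000):
        ability = rng.gauss(-1.0 if treated else 0.0, 1.0)
        pretest = ability + rng.gauss(0.0, ERROR_SD)        # fallible measure
        outcome = ability + TRUE_EFFECT * treated + rng.gauss(0.0, 0.5)
        rows.append((treated, pretest, outcome))

naive_effect, slope, diff_y, diff_x = ancova_effect(rows)

# Within-group reliability of the pretest: true variance / observed variance.
reliability = 1.0 / (1.0 + ERROR_SD ** 2)
corrected_effect = diff_y - (slope / reliability) * diff_x

print(f"naive adjusted effect:     {naive_effect:+.2f}")
print(f"corrected adjusted effect: {corrected_effect:+.2f}")
```

Under these assumptions the conventional adjustment makes a program with no effect look harmful, while the corrected estimate is close to zero. In practice the reliability would have to be estimated, and a validity study based on record linkage is one way to obtain it.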

To summarize, we observe that measures of social, psychological,
medical, or economic behavior are usually imperfect. If the imperfections
go unrecognized, then statistical analysis of the impact of programs
designed to ameliorate relevant problems will be insensitive at best,
misleading at worst. Statistics bearing on validity and reliability of
response are necessary for rational adjustment of conventional statistical
analyses so as to reduce bias in estimates of program impact. Record
linkage is often, though not always, necessary for production of the
necessary information on validity of the observations.

The view that administrative records ought to serve as the standard
against which survey records are judged is, at times, clearly unjustified.
Administrative records are tied to administrative action and, for that
reason, are normally susceptible to a variety of biases and sources of
error which do not affect survey data. One of several ways to appraise
the credibility of statistics based on those records is through
specially designed surveys.

Prior to 1910, for example, studies by the noted educational researcher
E. L. Thorndike on the adequacy of school records led to major reforms in
school record-keeping practices. Those studies relied partly on record
linkage to furnish evidence concerning deficiencies in existing record
systems (Goslin & Bordier, 1969). Later studies, conducted by economists,
contributed to what we now know about needs for record accuracy, publicity,
and adequacy in preventing abuse of power by public utilities (see Shils,
1938). More recently, Campbell (1975) and others have tried to enumerate
more fully the reasons for corruption of administrative records and to
develop some crude theory to account for the phenomena. Most of the theory
building depends in one way or another on the conduct of surveys to
appraise the quality of an archive's contents. The U.S. Army reporting
system for drug abuses, for example, was assessed during the early 1970s
using an experimental interview method which generally yields less
distorted information on actual abuse by identified individuals (see
Section 5). The debatable quality of criminal records maintained by
police has led to Federally-funded victimization surveys, conducted by
the Census Bureau, to determine the nature and incidence of unreported
crime, the elasticity in police definitions of crime, and so on. These
more recent examples do not depend on record linkage to make their point.
But whether a social scientific survey can be mounted to verify the
quality of an archival record system depends heavily on administrative
endorsement of the idea that multiple indicators of a phenomenon are
desirable. As the practice of conducting this kind of study increases,
the need for more depth of inquiry and, consequently, for linkages between
archival record and survey record will undoubtedly increase. It is
often possible to eliminate confidentiality-related problems in this
context by using the insulated data bank strategy described in Section
5 below.

3.2 Reducing Costs, Duplication of Effort, and Respondent Burden

Partial duplication of a data collection effort by several agencies
may be justified on several grounds. Independent archives which maintain
some overlapping information, for example, may be warranted by
legislation which requires independent collection and maintenance of the data;
they may be justified as a device for periodic cross-validation of the
contents of files. Nonetheless, exact or nearly exact duplication may
be costly to the data collection agencies and to the respondent who
must contribute the time required to supply the information to each
agency.

Although existing archival records have not often been used as a
basis for evaluating the impact of experimental social programs, they do
have some promise in this regard. The argument that archival records
can be used to mount more economical and more informative evaluations of
social programs has been advanced persuasively by the Committee on Federal
Program Evaluation of the National Academy of Sciences. We quote
verbatim from that report:

Once the major administrative archives of government,
insurance companies, hospitals, etc., are organized and
staffed for such research, the amount of interpretable
outcome data on ameliorative programs can be increased
tenfold. For example, Fisher (1972) reports on the use
of income tax data in a followup on the effectiveness of
manpower training programs. While these data are not
perfect or complete for the evaluation of such a training
program, they are highly relevant. Claims on unemployment
compensation and welfare payments would also be relevant.
Cost is an important advantage. Using a different approach,
Heller (1972) reports retrieval costs of $1 per person for
a study of several thousand trainees. Even if $10 were
more realistic, these costs are to be compared with costs
of $100 or more per interview in individual followup
interviews with ex-trainees. Rate of retrieval is another
potential advantage. Followup interviews in urban manpower
training programs have failed to locate as many as
50% of the population, and 30% loss rates would be common.
Differential loss rates for experimental and control groups
are also common, with the control groups less motivated to
continue. In the New Jersey Negative Income Tax Experiment,
over three years, 25.3% of the controls were lost, compared
with a loss of only 6.5% of those in the most remunerative
experimental condition. While retrieval rates overall
might be no higher for withholding tax records, the
differential bias in cooperation would probably be avoided, and
the absence of data could be interpreted, with caution,
as the absence of such earnings. (Campbell et al., 1975)

It takes little imagination to see how relying on existing archival
data can reduce the expense of a program evaluation. It is quite another
matter to employ such records creatively in difficult research settings.
One of the more clever applications of archival data stems from an effort
by Robertson and others (1972) to evaluate the impact of TV messages which
encourage drivers to wear their seat belts:

In some recent tests, four different types of TV messages were
broadcast over four different TV cables, each cable serving a
random set of households within a large region. The research
objective was to determine which TV broadcast fostered the
highest rate of seat belt usage. To evaluate usage, the
researchers first observed whether or not drivers in the
region wore seat belts as they stopped for lights at randomly
selected intersections. To link actual usage with area of
residence, i.e., with TV message type, some mechanism for
identifying each driver's residence was necessary. Rather
than question each driver, the researchers merely recorded
auto license numbers and employed State Motor Vehicle archives
to identify the driver's area of residence. Once each driver's
residence and seat belt use were linked, it was an easy matter
to compare the crude effects of alternative TV messages on use.
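
The linkage in this design can be sketched in a few lines. Everything in the sketch is hypothetical: the plate numbers, the area and message labels, and the function `usage_by_message` are invented for illustration and are not taken from Robertson's study. Observed plates are looked up in a registry to recover area of residence, area determines which message the household received, and only aggregate usage rates are retained.

```python
from collections import defaultdict

# Hypothetical field observations: (license number, seat belt worn?).
observations = [
    ("PLATE-01", True), ("PLATE-02", False), ("PLATE-03", True),
    ("PLATE-04", False), ("PLATE-05", False), ("PLATE-06", True),
    ("PLATE-99", True),  # not found in the registry, so it cannot be linked
]

# Hypothetical motor vehicle archive: license number -> area of residence.
registry = {
    "PLATE-01": "area_1", "PLATE-02": "area_1", "PLATE-03": "area_1",
    "PLATE-04": "area_2", "PLATE-05": "area_2", "PLATE-06": "area_2",
}

# Hypothetical assignment of TV message type to each cable area.
message_by_area = {"area_1": "message_A", "area_2": "message_B"}


def usage_by_message(observations, registry, message_by_area):
    """Return the seat belt usage rate for each message type."""
    counts = defaultdict(lambda: [0, 0])  # message -> [belted, observed]
    for plate, belted in observations:
        area = registry.get(plate)
        if area is None:
            continue  # unlinked observations are dropped
        message = message_by_area[area]
        counts[message][0] += int(belted)
        counts[message][1] += 1
    return {m: b / n for m, (b, n) in counts.items()}


rates = usage_by_message(observations, registry, message_by_area)
for message, rate in sorted(rates.items()):
    print(f"{message}: {rate:.0%} belted")
```

The license number serves only as the linking key; it does not appear in the result.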

Some examples of the savings engendered by temporary
linkage of governmental records have been documented by Hansen and Hatgls
(1966). In these cases, a sample of records maintained by
the U.S. Census Bureau, by the Internal Revenue Service, and by the Social
Security Administration was linked to determine how costs of surveys
might be reduced.

Prior to 1954, for example, the Economic Census of manufacturing,
retail, and other industries was conducted by field
interview survey, with some larger firms canvassed by mail.
In the interest of reducing costs markedly, mail survey was
considered as an alternative to expensive field interview
surveys. At that time, however, the Census had no mechanism for
construction and maintenance of up-to-date mailing lists.
Such mailing lists were maintained by the Internal
Revenue Service and in Social Security files, based on payroll
tax records, and with some modification, the basic lists were
checked for validity, then adopted by the Census Bureau as a
basis for the mail survey in the economic census. To obtain
data on the retail industry, conventional Internal Revenue
Service forms were modified slightly, making it possible to
eliminate any additional mail or interview surveys of this
industry by the Census Bureau. More than $6 million were
saved by employing this last strategy.

Similar savings were said to have been realized in the
1967 Economic Census where, for example, modifications to
Internal Revenue Service schedules permitted use of these forms
to elicit necessary information, and small direct interview
samples were adjoined to this effort to obtain necessary data
on products, merchandise lines, and so forth. Finally,
administrative records (from the Social Security Administration and
from the Census) have been used to construct mailing and
sampling lists economically for Bureau data collection programs
and to avoid duplicating the collection of information.

3.3 Clarifying and Enriching Statistical Data for Policy Analysis 
and Applied Social Research 

By clarification here we mean obtaining a better understanding of
the meaning, nature, and limitations of a particular social statistic.
"Employment rate," for example, is a deceptively simple label for a
characteristic which is complex in origin. Clarification often implies
an additional objective, that of enriching the data resource with respect
to number and kind of data archived, for the sake of higher quality
analysis. Improving the interpretability and analyzability of a data
set can be accomplished in a variety of ways. Linking of multiple data
sources for statistical purposes is one method of doing so. Note,
however, that linkage of all individual records may not be essential;
linking a (random) sample of records is often sufficient for this purpose.

To be concrete, consider that in the United States, the Internal Revenue
Service, the Social Security Administration, and the Census Bureau each
independently collect data on annual income from citizens. The separation
of effort is related to differences in the various agency functions. Two
of the Social Security Administration's primary missions, for example,
are understanding income redistribution at present and estimating the
impact of redistribution policy in the future. Most U.S. citizens are
required to pay a social security tax based in part on gross income, but
Federal employees often do not choose to enroll in the national Social
Security plan and so their incomes are not on file in SSA record systems.
The Internal Revenue Service directs its attention at a different but
overlapping universe, the tax-paying public. It has a different function,
taxation, and it defines income differently, notably in terms of "taxable
income." The U.S. Census Bureau's definition of income differs from each
of the other agencies' definitions because its function is unique
(statistical description of the state of the population) and because
there are severe limitations on the way in which census data can be
collected.

The result of these differences in definition of income, in universe,
and in function is that the relationships among these various sources
of data on "income" have not been well understood. The economist using
one source of data to predict the impact of a new health insurance policy
might well develop projections which differ notably from projections
made by an economist using another source of very similar information.
The discrepancy among sources is marked in particular cases, and it is
reasonable to use record linkage to bring some order out of this confusion.

To accommodate the problem, a massive Federal effort to
reconcile conceptual differences among record contents has
been mounted jointly by the U.S. Census Bureau, the Social
Security Administration, and the Internal Revenue Service.
The relevant data base includes the Current
Population Survey and administrative records from IRS and
SSA files. The reconciliation has three immediate
purposes: to understand the relationships among ostensibly
identical categories of information maintained by each
agency, to input resultant data into the SSA simulation
models of the tax transfer system, and to assess relative
biases in [illegible] statistics. The reconciliation involves
linking a sample of records on individuals from
the various archives, not linkage of the entire data bases.
Preliminary results of the study reported by Herriot and
Spiers (1975) suggest that census statistics on income are
quite reliable for salaried employees and regular wage
earners; the overlap between Census and IRS report contents is
about 96%. Income reports of the self-employed show somewhat
less accuracy (90% agreement between Census and IRS);
reports of interest and dividends made to Census are
considerably less reliable (less than 80% agreement) for
most respondent groups.

As a result of such research, the models of economic systems employed
by the U.S. Census Bureau and by the Social Security Administration (SSA)
can be improved considerably when error rates based on IRS data can be
recognized. The differential predictability of male and female incomes
becomes more interpretable with evidence on differential accuracy in
reporting such income to Census interviewers. The estimates of the impact
of training on income become more reliable when corrected for base rate
errors in reporting that income. And so on.

Similar benefits accrue from investigations of the differences in 
count data as a function of archival source. 

A study by Cobleigh and Alvey (1975), for example, shows
that differences in legally defined coverage of the population
by Census and by SSA produce a Census-comparable universe
which is about 94% of the SSA taxable earners' listings.
Given a comparable universe, reports of average annual
earnings from the two sources are in remarkable agreement
except for very low and very high income groups. In the very
low categories, SSA data show about 2[illegible]% more wage earners
than do the Census data; in the high income categories,
however, the Census counts are 10-20% higher than Social
Security reports. These latter differences are attributed
by the authors to definitional differences and reporting
irregularities including self-employment earnings not
reportable to SSA, rounding error in self-reports to Census,
late reporting to SSA, and other factors.

Another type of enrichment involves the use of archival records for
specialized research in which the record, though not disclosable by law
or social custom to the social scientist, represents a key element in
accomplishing applied research goals. Surrogates for the record may be
sought, of course, but in the absence of any suitable substitute, it is
often possible to capitalize effectively on restricted access records
without according special privileges to the social scientist. For
example, one of the peculiar and persistent tensions in our society
involves the zealous efforts of the U.S. Internal Revenue Service to
extract legitimate taxes from citizens and some citizens' equally
strenuous efforts to avoid paying them. In an effort to clarify the
conditions under which taxpayers will fulfill their responsibility with
somewhat less resistance (or at least dissatisfaction), Schwartz and
Orleans (1967) conducted experimental tests of those conditions to
compare relative rates of tax payments for a particular category
[illegible]. Taxpayers were assigned randomly to one of three advertising
strategies, the strategies differing in respect to their
emphasis in justifying payment of taxes. The first condition
relied heavily on appeals to moral conscience, the second on
threats of punitive legal action, and the third on threats
of social embarrassment (tax evasion being a matter for
public legal action). The objective of the experiments was
to determine which types of appeal led to higher rates of
[illegible] particular income. To do so credibly required that
condition or form of appeal be linked with the individual's
subsequent reports of income to the Internal Revenue Service.
In order to link the two kinds of records (the researcher's
record of condition and the IRS record of income) without
contravening IRS rules on disclosure of records (which are
confidential by law) and the researchers' rules concerning
disclosure of their own records, a mutually insulated file
approach, described in Section 5, was used. (The results
of the experiment are interesting. Middle-income respondents
respond most to the threats of legal action; low-income
groups respond most to appeals to moral conscience; the
high-income groups respond most to threats of social
embarrassment.)

The case for merging separate data sets into a permanent consolidated
pool of data is based on the assumption that the pooled data will be a
more informative basis for social research than separate files. Examples
of this are few, however, because the difficulty of match-merging files,
the differences in terminology, and differences in sample design and data
collection procedures have inhibited many researchers from consolidating
files. Moreover, it is difficult to anticipate the usefulness of linked
files without actually trying the idea out on a small sample of records.
Among the large-scale examples, the Wisconsin Assets and Income Studies
Archive (Bauman, David, & Miller, 1970) illustrates what can be
accomplished, however. Researchers appraise the effects of tax averaging
proposals, changing incomes from retirements, capital gains income,
and so on by simulating changes in tax laws, using the linked records as
the raw material for analysis. Records from the Internal Revenue
Service, Wisconsin tax records, and the Social Security Administration are
combined in the file, without jeopardizing privacy of individuals on
whom records are kept, to permit this research. The products of the
research are predictions about the importance of changes in tax laws for
individual income, a strategy which attenuates the need to rely solely on
anecdotal case study, intuition, and fragmented data as a basis for
legislation in the tax area.

The more elaborate and more sensitive merged systems are found in
the medical arena. Most involve both administrative and research information
and, because they are recent systems, the benefits of pooling both kinds
of data are not yet clear. Nonetheless, good reviews of the early
products of such work are available for social medicine, community health
services systems, and the like (e.g., Acheson, 1967). Laska and Bank's
(1975) description of the Rockland Institute's psychiatric information
system is probably one of the best of its kind. There is a strong
emphasis on legislative and technical safeguards for assuring the
confidentiality of the records. There is a hard-nosed product orientation:
aside from common demographic information, the system facilitates quality
control over treatment, time series analyses, and projective studies of
the incidence and development of mental illness, and permits some
uncontrolled studies of the effectiveness of treatment. Perhaps most
importantly, the system can be coupled neatly to experimental tests of
alternative treatments to better understand whether and how well the
treatments work (Endicott & Spitzer, 1975).



4. Privacy Implications: Private with Respect to Whom?

Any longitudinal research involves linking observations made on an
individual (or some other unit of analysis) at one point in time with
observations made at a second point. The average statistical relation
derived from the constellation of individual observations is, as we've
said, useful for description at least, and is often essential for planning
and evaluating social programs, for understanding change in human behavior,
and for building theory and simulation models. The linkage is usually
but not always made on the basis of clear identification of the respondent.
Insofar as the identified respondent does share information about himself,
the sharing process may be regarded, in principle, as a depreciation of
the individual's privacy. That depreciation may be quite innocuous in
the sense that information disclosed is innocuous; or it may be
controversial, as in longitudinal studies of mental health.

Similarly, correlational data analysis must often be based on linkage
of records from different archives. And if that linkage is based on
clear identification contained in each record, then privacy may be
depreciated in principle here as well. The custodian of an administrative
archive may, by permitting linkage, violate law at worst or social customs
at best by disclosing records to a researcher for linkage, however
worthwhile the purpose of linkage. There may be a similar breach of a promise
of confidentiality for a researcher who discloses his own records on
identifiable individuals to an administrative archive, for example, in
order to verify his records against those maintained by the archive.

These implications are almost useless in developing general
strategies for assuring individual privacy. For although disclosure of
information may represent a depreciation of privacy in principle, the
fact of the matter is that neither government, nor social or administrative
science, nor the respondent could get on well without some exchange
of information about individuals. Admitting this, the focus must change
from absolute assurance of confidentiality to balancing social information
against the privacy-related needs of the individual. One approach to
achieving that balance in a concrete way is to try to minimize depreciation
of privacy without notably abridging our ability to collect meaningful
data on human behavior. Doing so requires that we first identify the
sources of risk in social research, then build mechanisms (procedural,
statistical, and legal) to attenuate that risk.

We recognize, for example, that privacy may be reduced directly
with respect to the social scientist. In the past, such depreciation
has been innocuous partly because social research itself has been fairly
innocuous. But as applications of social research to social problems
increase, as social scientists investigate more important or more
controversial topics, the attention given to their inquiries is likely
to increase. The import attached to relatively minor depreciation of
privacy will increase. And so it becomes the social scientist's
responsibility to develop mechanisms for minimizing the depreciation of privacy
with respect to the researcher. We believe this in spite of the fact that
no substantial risks to the respondent are usually engendered by survey
research. The lack of risk is traceable to the researcher's lack of
interest in making personal judgments about particular individuals and
his interest in statistical analysis of the relevant data. Identifiers
serve merely as an accounting device, rather than as a vehicle for
administrative action against (or for that matter, for) an individual.
Nonetheless, if identifiers could somehow be eliminated in the research
process, or if the tie between identifier and response could be made
useless for making personal judgments about individuals, without damaging
research objectives needlessly, then we would do so. Partial solutions
to the problem of doing so (Section 5) have been developed partly as
a matter of principle, and partly because risks of disclosure may be
generated by persons or agencies other than the researcher.

It is clear, too, that fraudulent researchers, i.e., individuals
posing as social scientists, can and occasionally do deceive citizens.
They are motivated by financial gain (e.g., salesmen posing as pollsters),
by pathological influences (e.g., rapists posing as survey interviewers,
or at times, as policemen), or by other factors. In the interest of
preserving the integrity of the profession and public trust in the social
scientist, the social scientist must take some responsibility for
protecting respondents against these infrequent but important dangers.

Social research records on identifiable individuals are often
irrelevant for making administrative judgments about those individuals.
We deal in samples rather than populations, and idiosyncratic ones at
that. We deal with information which is usually not at the correct level
of relevance or detail for administrative use. This partial relevance
of research records on individuals usually serves as an inhibition against
the appropriation of records for nonresearch purposes. Nonetheless,
appropriation can and does occur. It may emerge under legal mandate as
it has in the United States where, in a few instances, research records
have been subpoenaed for use in judicial investigation of particular
survey respondents. Exploitation may occur under legal traditions which
are quite arbitrary and at times border on the capricious, as in some
Congressional investigating committees. Or, the exploitation
may be quite illegal, as in the thefts of research records for
personal profit or for the purpose of embarrassing the respondent. The
consequences to the respondent can be serious: social embarrassment, legal
sanction, personal discomfort. The consequences for research are no less
serious: its inhibition and abrogation, now and in the future.

These risks are in principle real, if in practice remote. And so
they deserve attention too. In particular, it is reasonable to examine
mechanisms which protect the respondent from capricious action by law
enforcement agencies, from criminal action based on the information he
provides to a researcher, and from other attempts to appropriate research
records for nonresearch purposes. This is especially true for those cases
in which the benefits of the research are likely to offset greatly the
social benefits of legal appropriation of records.

5. Competing and Conjoint Approaches to Assuring Confidentiality
of Response in Social Research

The general implication of the preceding section is that we take as
an objective reducing depreciation of privacy without severe abridgment of
research goals. Accommodating this joint task is difficult but there have
been a variety of efforts mounted recently to do so. The major strategic
approaches can be grouped into three broad categories (procedural,
statistical, and law-related) which we consider next. This examination
is brief; details are given in Boruch (1976).

5.1 Procedural Approaches 

For longitudinal data collected periodically within the same framework,
the simple device of using alias identifiers is obvious but
underutilized. The alias may be created by the respondent and used consistently
in response to permit intrasystem linkage. It may be created by social
scientists, provided to the respondent, then purged from the social
scientists' files to achieve the same ends. To decentralize the process,
some neutral brokerage agency (a census bureau, a nongovernmental agency)
may similarly create an alias for the respondent and destroy its own
records of any linkage between clear identification and alias.
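
As an illustration only, intrasystem linkage by alias can be sketched in a few lines of Python. Nothing here comes from the studies cited: the function name, the alias strings, and the response values are hypothetical. The sketch shows only that successive waves can be joined on the alias while no clear identifier is ever stored.

```python
from collections import defaultdict

def link_waves_by_alias(*waves):
    """Join survey waves on a respondent-chosen alias (hypothetical helper).

    Each wave is a list of (alias, response) pairs. No clear identifier
    such as a name or address appears anywhere in the files.
    """
    linked = defaultdict(dict)
    for wave_number, wave in enumerate(waves, start=1):
        for alias, response in wave:
            linked[alias][wave_number] = response
    return dict(linked)

# Illustrative data: two waves, the same two aliases in a different order.
wave_1 = [("blue-heron", 3), ("oak-17", 5)]
wave_2 = [("oak-17", 4), ("blue-heron", 2)]
records = link_waves_by_alias(wave_1, wave_2)
```

The limitation is visible in the sketch: the only key available is the alias, so these records cannot be joined to any outside file that is keyed by name.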

The strategy has been field tested with some success in U.S. drug
studies, political attitude surveys and the like. Aside from logistical
problems, its major shortcomings are the limitations imposed on linking
the data elicited under alias with any other existing data on individuals.

To accommodate some logistical problems as well as the limitation on
intersystem linkage, procedures such as the link file system have been
developed. In this technique, a dictionary of double aliases is created
by the social scientist and given over for safekeeping to an independent
agency. The decentralization of the process enhances physical security,
and if the agency is legally entitled to resist governmental appropriation
[illegible], the procedure is legally secure. The dictionary is used as
[illegible] information which is [illegible] obtained from
[illegible]. It reduces the need to maintain longitudinal records on
identified [illegible] containing identifiers
to an arbitrarily short period (see Astin & Boruch, 1970).
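
The double-alias dictionary can be pictured with a small Python sketch. This is an assumed reading of the link file idea, not the procedure documented by Astin and Boruch: the three file names, the `agency_translate` helper, and all data values are hypothetical. The researcher holds a data file keyed by one alias and a name file keyed by a second alias; only the independent agency holds the dictionary tying the two aliases together.

```python
import secrets

def new_alias():
    # Random token carrying no information about the respondent.
    return secrets.token_hex(8)

# First-wave data keyed by clear identifier (illustrative values).
first_wave = {"Ann Lee": {"wave1": 12}, "Raj Patel": {"wave1": 9}}

data_file = {}   # alias_a -> research data, kept by the researcher
name_file = {}   # alias_b -> clear identifier, kept by the researcher
link_file = {}   # alias_b -> alias_a, handed to the independent agency

for name, data in first_wave.items():
    alias_a, alias_b = new_alias(), new_alias()
    data_file[alias_a] = dict(data)
    name_file[alias_b] = name
    link_file[alias_b] = alias_a
del first_wave  # the researcher keeps no direct tie between name and data

def agency_translate(followup, link):
    """Independent agency: swap alias_b for alias_a. It never sees names."""
    return {link[alias_b]: data for alias_b, data in followup.items()}

# Follow-up wave: responses come back tagged with alias_b only.
alias_b_for = {name: alias_b for alias_b, name in name_file.items()}
followup = {
    alias_b_for["Ann Lee"]: {"wave2": 15},
    alias_b_for["Raj Patel"]: {"wave2": 7},
}
for alias_a, data in agency_translate(followup, link_file).items():
    data_file[alias_a].update(data)

longitudinal = sorted(data_file.values(), key=lambda r: r["wave1"])
```

In this arrangement the researcher can mail follow-up questionnaires from the name file and analyze the data file, but cannot connect a name to a data record without the agency's dictionary.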

When records from different archives must be linked, procedures
have been developed to permit linkage without [illegible].
Among the better known systems for doing so is the "mutually
insulated file approach" used in the Schwartz-Orleans (1967) study
cited earlier. Basically, the system involves two files of records
operated under different auspices [illegible]; there is some overlap
between the samples of individuals on which the records are maintained.
To accomplish the linkage, the first archive (assume it is the social
scientist) cryptographically encodes the information portion of each
record, producing a new file without meaning to any outsider, which is
then transmitted to the record archive. The archive then matches the
encoded records with its own records, based on the clear identifiers
appearing in each record. Upon completion of the match, identifiers are
deleted and the linked records are returned to the social scientist, who
then decodes relevant portions of the linked records and conducts his
statistical analysis of the anonymous records. (See also Boruch, 1972,
and Campbell et al., 1975.)
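
The sequence of steps just described (encode, match on clear identifiers, delete identifiers, return, decode) can be sketched in Python. This is a toy under stated assumptions, not the system used in the Schwartz-Orleans study: the XOR encoding merely stands in for whatever cryptographic encoding a real application would use and is not secure, and the names, conditions, and income figures are invented for illustration.

```python
import random
import secrets

def xor_bytes(data, key):
    # Toy reversible encoding; a stand-in for real encryption.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Researcher's file: clear identifier plus experimental condition.
researcher_file = [("Ann Lee", "moral"), ("Raj Patel", "legal"),
                   ("Kim Cho", "social")]
key = secrets.token_bytes(16)  # known only to the researcher
encoded_file = [(name, xor_bytes(cond.encode(), key))
                for name, cond in researcher_file]

# Archive's file: clear identifier plus its own confidential datum.
archive_file = {"Ann Lee": 18200, "Raj Patel": 23100, "Dee Roy": 9900}

def archive_match(encoded_records, own_records):
    """Archive side: match on the identifier, then drop the identifier."""
    linked = [(blob, own_records[name])
              for name, blob in encoded_records if name in own_records]
    random.shuffle(linked)  # record order should not hint at identity
    return linked

returned = archive_match(encoded_file, archive_file)

# Researcher decodes the information portion of the anonymous records.
analysis_file = sorted((xor_bytes(blob, key).decode(), income)
                       for blob, income in returned)
```

The archive never learns the condition, and the researcher never sees an income attached to a name. One caution about the sketch: a returned record whose encoded portion is unique in the researcher's file could be traced back by the researcher, so anonymity here depends on many records sharing each encoded value.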

These procedures are [illegible] vulnerable
to corruption. Nonetheless, they are useful in some, but not all, research
settings to assure confidentiality of data with respect to the researcher
and outsiders, and they can be tailored to accommodate longitudinal and
correlational studies. Their refinement has been undertaken by both
[illegible] bureaucracy to enhance the procedures'
flexibility and protection level (Boruch, 1976). Some of the refinements
depend on statistical approaches considered below.

5.2 Statistical Approaches

The devices just described are most often relevant to the more impersonal
forms of observation (questionnaires and the like) rather than to direct
interview research. And in some instances, the logistical difficulties
attached to their use are considerable. Partly for these reasons, it may
be more appropriate to capitalize on one of the statistical strategies
which have been developed to reduce depreciation in privacy. A variety
of approaches exists and these may be used alone or in conjunction
with the procedural devices.

The best known class of approaches is the randomized response
tactic currently under test and development by Greenberg in the U.S.,
[illegible] in Sweden, Warner in Canada, Moors in Holland, and others.
In the simplest variation of the approach, the social scientist
simultaneously presents a sensitive inquiry to an individual, e.g.,
"Did you cheat on your income taxes this year?" and an insensitive one,
e.g., "Do you prefer potatoes over noodles?" The individual is then
instructed to roll a die and to respond to the first question if a one
or two shows up, and to the second question if a three, four, five, or
six shows. He is also told to refrain from giving the interviewer any
indication of which question was answered. When the process is carried
out on two large samples of individuals and instructions are followed by
the respondent, it is possible to estimate the proportion of individuals
in the sample who have cheated on their income tax forms and the
proportion who prefer noodles. In particular, given some simple laws of
probability, the odds on answering one or the other question, and the
observed proportion of Yes responses, the estimation is a matter of
simple algebra.
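
The "simple algebra" can be made explicit with a short Python sketch. It rests on assumptions the text does not state: the two samples must use different die rules for both proportions to be recoverable, and the rule for the second sample (sensitive question on a one through four) is chosen here purely for illustration. If p is the chance of answering the sensitive question, the expected Yes rate in a sample is p times the sensitive proportion plus (1 - p) times the innocuous proportion; two samples give two such equations in two unknowns. The function name and the 20% and 60% figures are hypothetical.

```python
def unrelated_question_estimates(yes_rate_1, yes_rate_2, p1, p2):
    """Estimate (pi_sensitive, pi_innocuous) from two samples.

    yes_rate_i : observed proportion of Yes answers in sample i
    p_i        : probability that a respondent in sample i is directed
                 to the sensitive question (must differ across samples)
    Model: yes_rate_i = p_i * pi_sensitive + (1 - p_i) * pi_innocuous
    """
    if p1 == p2:
        raise ValueError("the two samples need different die rules")
    pi_sensitive = (yes_rate_1 * (1 - p2) - yes_rate_2 * (1 - p1)) / (p1 - p2)
    pi_innocuous = (yes_rate_2 * p1 - yes_rate_1 * p2) / (p1 - p2)
    return pi_sensitive, pi_innocuous

p1 = 2 / 6  # die shows one or two: sensitive question (rule from the text)
p2 = 4 / 6  # assumed rule for the second sample (not given in the text)

# Yes rates that would be expected if 20% cheated and 60% answer Yes
# to the innocuous question (illustrative figures only).
yes_rate_1 = p1 * 0.20 + (1 - p1) * 0.60
yes_rate_2 = p2 * 0.20 + (1 - p2) * 0.60

estimate = unrelated_question_estimates(yes_rate_1, yes_rate_2, p1, p2)
```

With real data the observed Yes rates carry sampling error, so the estimates are approximate; the interviewer still never learns which question any one respondent answered.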

The technique permits us to establish the statistical character
of sensitive properties of groups of individuals. And moreover, it does
so without disclosing to the social scientist any information about a
particular individual. It has been field tested in drug studies, in
fertility control studies and other areas, and those tests continue in
the U.S., Canada, Sweden, and elsewhere. The basic method is being
refined to make it more efficient in a statistical sense, more acceptable
to the respondent in a social psychological sense, and less vulnerable
to corruption in a legal sense.

A separate class of approaches is based on aggregation of response.
The individual is asked not to respond individually to each of a set of
questions but to respond in aggregated form to the set. In particular
variations, for example, the respondent may add up numerical values
corresponding to each answer of each question in a set. If "Yes" is
assigned a value of 1 and "No" a value of -1, for example, the answer
provided to a set of 10 questions each answerable with a Yes or No is a
single number whose permissible range is -10 to +10. If numerical
assignment is varied from one sample to the next, one needs only a little
algebra (notably, methods for solving a system of simultaneous equations)
to estimate the proportions of individuals in the total sample who have
each of the properties.
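
A small Python sketch shows how the simultaneous equations arise. The particular scoring schemes, the three-question set, and the function names are assumptions made for illustration, not details from the source. If sample s scores a Yes on question j as y[s][j] and a No as n[s][j], the expected total in that sample is the sum over j of n[s][j] + (y[s][j] - n[s][j]) * pi[j], which is linear in the unknown proportions pi[j]. With as many differently scored samples as questions, the system can be solved.

```python
from fractions import Fraction

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact fractions.

    Assumes the system is square and nonsingular.
    """
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)]
         for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]

def estimate_proportions(mean_totals, yes_scores, no_scores):
    """Recover the proportion answering Yes to each question from the
    mean reported total in each sample."""
    coeffs = [[y - n for y, n in zip(ys, ns)]
              for ys, ns in zip(yes_scores, no_scores)]
    rhs = [mean - sum(ns) for mean, ns in zip(mean_totals, no_scores)]
    return solve_linear(coeffs, rhs)

# Three questions, three samples; sample 1 uses the +1 / -1 scoring
# from the text, the other two schemes are assumed.
yes_scores = [[1, 1, 1], [2, 1, 1], [1, 2, 1]]
no_scores = [[-1, -1, -1], [-1, -1, -1], [-1, -1, -1]]

# Mean totals that would be expected for illustrative true proportions.
true_pi = [Fraction(1, 10), Fraction(1, 2), Fraction(3, 10)]
mean_totals = [sum(n + (y - n) * p for y, n, p in zip(ys, ns, true_pi))
               for ys, ns in zip(yes_scores, no_scores)]

estimated_pi = estimate_proportions(mean_totals, yes_scores, no_scores)
```

The scoring schemes must differ enough across samples that the coefficient matrix is nonsingular; identical schemes in every sample would leave only the sum of the proportions identifiable.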

Such a technique permits one to elicit even sensitive information
in direct interview situations without any deterministic linkage
between an identified response to the researcher's question and the
actual status of the individual. With some technical improvements, it
probably can be applied to some longitudinal studies in which average
relations among properties are essential.

The third and final class of statistical techniques which has
received some attention is aggregation of the sample. The technique
requires that one obtain data not on single identified individuals
but rather on very small and carefully constructed clusters of individuals.
If the cluster composition remains the same over time, each cluster
can, under certain conditions, be regarded as a synthetic person, a
composite of all the properties of the small set of individuals it
comprises. Some informative data analyses can be conducted on those
aggregates and, insofar as aggregation helps to assure anonymity of
individual response, there is no depreciation of individual privacy.
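
A minimal Python sketch of the "synthetic person" idea follows. The clustering rule (sort the units on a first-wave value and cut the list into fixed-size groups), the minimum cluster size, and the data are all assumptions for illustration; the source does not say how clusters are constructed. Cluster membership is fixed once and reused at each wave, so each cluster mean can be followed over time.

```python
from statistics import mean

def build_clusters(ids_sorted, size):
    """Cut an ordered list of unit ids into clusters of `size` members.

    A short remainder is folded into the last full cluster so that no
    cluster is smaller than `size`.
    """
    clusters = [ids_sorted[i:i + size]
                for i in range(0, len(ids_sorted), size)]
    if len(clusters) > 1 and len(clusters[-1]) < size:
        tail = clusters.pop()
        clusters[-1].extend(tail)
    return clusters

def cluster_means(clusters, values):
    """One mean per cluster: the 'synthetic person' for that wave."""
    return [mean(values[unit] for unit in cluster) for cluster in clusters]

# Illustrative data on seven units at two points in time.
wave_1 = {"b1": 10, "b2": 12, "b3": 14, "b4": 30,
          "b5": 34, "b6": 38, "b7": 40}
wave_2 = {"b1": 11, "b2": 13, "b3": 15, "b4": 31,
          "b5": 35, "b6": 39, "b7": 43}

# Membership is fixed once, from the first wave, and reused afterwards.
clusters = build_clusters(sorted(wave_1, key=wave_1.get), size=3)
synthetic_wave_1 = cluster_means(clusters, wave_1)
synthetic_wave_2 = cluster_means(clusters, wave_2)
```

Only the cluster means would be released to the analyst. Grouping similar units together keeps each mean close to its members' values, which preserves more information but also means very small clusters give weaker protection.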

The applications of sample microaggregation have so far been limited
to economic research on commercial units. Banks, for example, may be
reluctant to release information about their operations to any outside
economist. They are willing, however, to have the social scientist
analyze aggregates of banks in the interest of reconciling bank privacy
with future research. And indeed, a major system of data maintenance
and dissemination has been built up on this theme by the University of
Wisconsin (see Bauman, David, & Miller, 1970).

5.3 Approaches Based on Law and Government Practice

The final class of approaches to facilitating the privacy of the
respondent in social research concerns formal legal action by legislators,
the courts, or governmental executive agencies. Such action is taken to
assure that when identifiable data must be collected for research purposes,
the data will not be used for purposes other than research. As a practical
matter, this means not only strengthening legal sanctions against criminal
appropriation of research records, but also defining bounds on governmental
appropriation of records. The actions are taken to reduce the likelihood
that research records on identifiable individuals will be used to
depreciate privacy any more than is normally required by research and to
isolate that research against temporary threats, legal or otherwise, when
the potential benefits of research justify this course of action. The
forms which such protection may take vary considerably, and so we describe
only a few stereotypes here.

In some of the United States, public officials such as the governor
are empowered by the state constitution or other legislative act to offer
testimonial privilege to a social researcher. That privilege entitles
the recipient to legally resist any legal effort to appropriate his
records on identifiable individuals. The threat of appropriation may
stem from a prosecutor's idea that he may use even an unwilling researcher
as a criminal investigator. It may stem from arbitrary exercise of
subpoena power by legislatures or the courts. In order to legally assure
that data will not be so appropriated, and consequently to increase the
likelihood that individuals will cooperate in the research, a governor
may then provide testimonial privilege on an ad hoc basis. To take a
specific example, the governor of Vermont gave such privilege to
researchers and respondents who participated in roadside surveys of
drivers. The survey objective was to estimate the proportion of
drinking drivers (blood tests were given to drivers) and the privilege
was essential in getting high cooperation rates. Drivers who were legally
intoxicated were driven home by a policeman. No record of any identified
individual's condition was lodged with any law enforcement agency or
other governmental archive, though drivers would normally be prosecuted
under the law.

This sort of privilege can be applied in special cases where
potential benefits of the survey are high and the relevant government
executive is well enough informed to recognize the fact. However, we
cannot always rely on expected benefits of research, for although some
research may be important, it may also be risky with respect to its
payoff. Nor can we always rely on the good offices of the public
official, for the awarding of such privilege is discretionary and political
factors may argue against it. In any event, discretionary privilege may be
as susceptible to abuse from the naive researcher as it has been
to occasional abuse by some government executives.

Judicial discretion is another potential source of support for
social scientists who, having collected identifiable data and having
established a need for its maintenance, wish to secure it against
nonresearch uses. In some cases, it has been possible for the scientist to
legally resist a court-issued subpoena on grounds that the disclosure of
identified records to the court would badly disable a major research
effort. Evidence that breaches of confidentiality can be harmful to
research efforts is readily available and can be used effectively to
show cause why the records should not be used except in anonymous form.
In fact, a similar line of argument has been used in a case involving
the Negative Income Tax Experiments in New Jersey: The suspicion of
fraud among people who happened to participate in the research led to a
grand jury investigation and subpoena of research records on identified
individuals.

Judicial discretion, like executive discretion, is by definition
a bit arbitrary at best, and wildly unpredictable at worst. So its
usefulness in protecting the confidentiality of data is not especially
promising.

Legislative action in the form of [illegible] law is both feasible
and, from the point of view of uniformity, very desirable. In particular,
it is possible to build law to grant testimonial privilege to legitimate
social scientists under well defined conditions and uniformly applied
criteria. It is also possible to build into such law sanctions against
the fraudulent researcher or the corrupt social scientist or the public
official who might attempt to appropriate research data for nonresearch
purposes.

The 1970 Drug Abuse Act and the 1970 Alcohol Abuse Acts, for
example, each carry a statute which permits the Attorney General to
accord privilege to social scientists who are funded by the government
to conduct research on these topics. Under the Public Health Act, persons
engaged in research on mental health, including the use of alcohol and
other psychoactive drugs, can be accorded privilege by the Secretary of
Health, Education, and Welfare to protect the privacy of individuals
who are subjects of such research.

These are new laws, enacted specifically to assure the confidentiality
of social research records on identifiable individuals. They represent
a delimitation of power on governmental access to social research records,
and a delimitation of the conditions under which the researcher may act.
They represent a spirit of support for the social sciences as well as an
appreciation for the negative impact which even legal appropriation of
research records may exert on policy-relevant research. At least one
such law has been tested by the courts, and its intent has been reaffirmed
in that arena as well.

Footnotes

1. Background research for this paper was supported under a contract
(NIE-C-74-0115) with the National Institute of Education.

2. For empirical data, theory, and policy implications of longitudinal
studies in development, see Schaie (1965), Wohlwill (1969, 1970),
and Magnusson, Duner, and Zetterblom (1975). The results of a variety
of such studies in Britain, the United States, France, and Germany are
summarized by Wall and Williams (1970).

3. Cohort effects have been recognized only recently by commercial
market researchers as an important variable in predicting and explaining
the demand for certain consumer goods. Systematic cohort variation
in what is regarded as a luxury item, for example, has some important
implications for planning the allocation of an industry's manufacturing
resources (see Business Week, January 12, 1976, pp. 74-78).

4. Conducting special social surveys to assess the quality of routinely
issued governmental statistics is not a new idea. Neither is government's
attempt to suppress the results of special surveys novel. See
Boruch (1976) for a review of suppression efforts at the local,
regional, and national level.



5. For a detailed examination of the benefits, shortcomings, vulnerability,
and legal implications of some of these strategies, see Boruch (1974),
and Campbell, Boruch, Schwartz, and Steinberg (1975).

6. This list contains some items which have not been specifically cited
in the text.